我们考虑主人想要在$ n $ Workers上运行分布式随机梯度下降(SGD)算法的设置,每个算法都有一个数据子集。分布式SGD可能会遭受散乱者的影响,即导致延迟的缓慢或反应迟钝的工人。文献中研究的一种解决方案是在更新模型之前等待每次迭代的最快$ k <n $工人的响应,其中$ k $是固定的参数。 $ k $的价值的选择提供了SGD的运行时(即收敛率)与模型错误之间的权衡。为了优化误差折衷,我们研究了在整个算法的运行时,以自适应〜$ k $(即不同的$ k $)调查分布式SGD。我们首先设计了一种自适应策略,用于改变$ k $,该策略根据我们得出的墙壁通行时间的函数,基于上限的上限来优化这种权衡。然后,我们建议并实施一种基于统计启发式的自适应分布式SGD的算法。我们的结果表明,与非自适应实现相比,分布式SGD的自适应版本可以在更少的时间内达到较低的误差值。此外,结果还表明,自适应版本是沟通效率的,其中主人与工人之间所需的通信量小于非自适应版本的沟通量。
translated by 谷歌翻译
深度神经网络(DNN)已被广泛使用,并在计算机视觉和自动导航领域起着重要作用。但是,这些DNN在计算上是复杂的,并且在没有其他优化和自定义的情况下,它们在资源受限平台上的部署很困难。在本手稿中,我们描述了DNN体系结构的概述,并提出了降低计算复杂性的方法,以加速培训和推理速度,以使其适合具有低计算资源的边缘计算平台。
translated by 谷歌翻译
已经提出了高效和自适应计算机视觉系统以使计算机视觉任务,例如图像分类和对象检测,针对嵌入或移动设备进行了优化。这些解决方案最近的起源,专注于通过设计具有近似旋钮的自适应系统来优化模型(深神经网络,DNN)或系统。尽管最近的几项努力,但我们表明现有解决方案遭受了两个主要缺点。首先,系统不考虑模型的能量消耗,同时在制定要运行的模型的决定时。其次,由于其他共同居民工作负载,评估不考虑设备上的争用的实际情况。在这项工作中,我们提出了一种高效和自适应的视频对象检测系统,这是联合优化的精度,能量效率和延迟。底层Virtuoso是一个多分支执行内核,它能够在精度 - 能量 - 延迟轴上的不同运行点处运行,以及轻量级运行时调度程序,以选择最佳的执行分支以满足用户要求。要与Virtuoso相当比较,我们基准于15件最先进的或广泛使用的协议,包括更快的R-CNN(FRCNN),YOLO V3,SSD,培训台,SELSA,MEGA,REPP,FastAdapt和我们的内部FRCNN +,YOLO +,SSD +和高效+(我们的变体具有增强的手机效率)的自适应变体。通过这种全面的基准,Virtuoso对所有上述协议显示出优势,在NVIDIA Jetson Mobile GPU上的每一项效率水平上引领精度边界。具体而言,Virtuoso的准确性为63.9%,比一些流行的物体检测模型高于10%,51.1%,yolo为49.5%。
translated by 谷歌翻译
具有早期退出机制的最先进的神经网络通常需要大量的培训和微调,以通过低计算成本来实现良好的性能。我们提出了一种新颖的早期出口技术,基于样本的类手段,提前出口课程(E $^2 $ cm)。与大多数现有方案不同,E $^2 $ cm不需要基于梯度的内部分类器培训,并且不会通过任何方式修改基本网络。这使其对于低功率设备的神经网络培训特别有用,如无线边缘网络。我们评估了E $^2 $ cm的性能和间接费用,例如MobileNetV3,EdgisterNet,Resnet和数据集,例如CIFAR-100,Imagenet和KMNIST。我们的结果表明,鉴于固定的培训时间预算,与现有的早期退出机制相比,E $^2 $ cm的准确性更高。此外,如果培训时间预算没有限制,则可以将E $^2 $ cm与现有的早期退出计划相结合,以提高后者的性能,从而在计算成本和网络准确性之间取得更好的权衡。我们还表明,E $^2 $ cm可用于降低无监督学习任务中的计算成本。
translated by 谷歌翻译
Semantic segmentation works on the computer vision algorithm for assigning each pixel of an image into a class. The task of semantic segmentation should be performed with both accuracy and efficiency. Most of the existing deep FCNs yield to heavy computations and these networks are very power hungry, unsuitable for real-time applications on portable devices. This project analyzes current semantic segmentation models to explore the feasibility of applying these models for emergency response during catastrophic events. We compare the performance of real-time semantic segmentation models with non-real-time counterparts constrained by aerial images under oppositional settings. Furthermore, we train several models on the Flood-Net dataset, containing UAV images captured after Hurricane Harvey, and benchmark their execution on special classes such as flooded buildings vs. non-flooded buildings or flooded roads vs. non-flooded roads. In this project, we developed a real-time UNet based model and deployed that network on Jetson AGX Xavier module.
translated by 谷歌翻译
In optimization-based approaches to inverse problems and to statistical estimation, it is common to augment the objective with a regularizer to address challenges associated with ill-posedness. The choice of a suitable regularizer is typically driven by prior domain information and computational considerations. Convex regularizers are attractive as they are endowed with certificates of optimality as well as the toolkit of convex analysis, but exhibit a computational scaling that makes them ill-suited beyond moderate-sized problem instances. On the other hand, nonconvex regularizers can often be deployed at scale, but do not enjoy the certification properties associated with convex regularizers. In this paper, we seek a systematic understanding of the power and the limitations of convex regularization by investigating the following questions: Given a distribution, what are the optimal regularizers, both convex and nonconvex, for data drawn from the distribution? What properties of a data source govern whether it is amenable to convex regularization? We address these questions for the class of continuous and positively homogenous regularizers for which convex and nonconvex regularizers correspond, respectively, to convex bodies and star bodies. By leveraging dual Brunn-Minkowski theory, we show that a radial function derived from a data distribution is the key quantity for identifying optimal regularizers and for assessing the amenability of a data source to convex regularization. Using tools such as $\Gamma$-convergence, we show that our results are robust in the sense that the optimal regularizers for a sample drawn from a distribution converge to their population counterparts as the sample size grows large. Finally, we give generalization guarantees that recover previous results for polyhedral regularizers (i.e., dictionary learning) and lead to new ones for semidefinite regularizers.
translated by 谷歌翻译
In this work, we give efficient algorithms for privately estimating a Gaussian distribution in both pure and approximate differential privacy (DP) models with optimal dependence on the dimension in the sample complexity. In the pure DP setting, we give an efficient algorithm that estimates an unknown $d$-dimensional Gaussian distribution up to an arbitrary tiny total variation error using $\widetilde{O}(d^2 \log \kappa)$ samples while tolerating a constant fraction of adversarial outliers. Here, $\kappa$ is the condition number of the target covariance matrix. The sample bound matches best non-private estimators in the dependence on the dimension (up to a polylogarithmic factor). We prove a new lower bound on differentially private covariance estimation to show that the dependence on the condition number $\kappa$ in the above sample bound is also tight. Prior to our work, only identifiability results (yielding inefficient super-polynomial time algorithms) were known for the problem. In the approximate DP setting, we give an efficient algorithm to estimate an unknown Gaussian distribution up to an arbitrarily tiny total variation error using $\widetilde{O}(d^2)$ samples while tolerating a constant fraction of adversarial outliers. Prior to our work, all efficient approximate DP algorithms incurred a super-quadratic sample cost or were not outlier-robust. For the special case of mean estimation, our algorithm achieves the optimal sample complexity of $\widetilde O(d)$, improving on a $\widetilde O(d^{1.5})$ bound from prior work. Our pure DP algorithm relies on a recursive private preconditioning subroutine that utilizes the recent work on private mean estimation [Hopkins et al., 2022]. Our approximate DP algorithms are based on a substantial upgrade of the method of stabilizing convex relaxations introduced in [Kothari et al., 2022].
translated by 谷歌翻译
Prototyping and validating hardware-software components, sub-systems and systems within the intelligent transportation system-of-systems framework requires a modular yet flexible and open-access ecosystem. This work presents our attempt towards developing such a comprehensive research and education ecosystem, called AutoDRIVE, for synergistically prototyping, simulating and deploying cyber-physical solutions pertaining to autonomous driving as well as smart city management. AutoDRIVE features both software as well as hardware-in-the-loop testing interfaces with openly accessible scaled vehicle and infrastructure components. The ecosystem is compatible with a variety of development frameworks, and supports both single and multi-agent paradigms through local as well as distributed computing. Most critically, AutoDRIVE is intended to be modularly expandable to explore emergent technologies, and this work highlights various complementary features and capabilities of the proposed ecosystem by demonstrating four such deployment use-cases: (i) autonomous parking using probabilistic robotics approach for mapping, localization, path planning and control; (ii) behavioral cloning using computer vision and deep imitation learning; (iii) intersection traversal using vehicle-to-vehicle communication and deep reinforcement learning; and (iv) smart city management using vehicle-to-infrastructure communication and internet-of-things.
translated by 谷歌翻译
Neural networks have revolutionized the area of artificial intelligence and introduced transformative applications to almost every scientific field and industry. However, this success comes at a great price; the energy requirements for training advanced models are unsustainable. One promising way to address this pressing issue is by developing low-energy neuromorphic hardware that directly supports the algorithm's requirements. The intrinsic non-volatility, non-linearity, and memory of spintronic devices make them appealing candidates for neuromorphic devices. Here we focus on the reservoir computing paradigm, a recurrent network with a simple training algorithm suitable for computation with spintronic devices since they can provide the properties of non-linearity and memory. We review technologies and methods for developing neuromorphic spintronic devices and conclude with critical open issues to address before such devices become widely used.
translated by 谷歌翻译
通过各种物体学习各种灵巧的操纵行为仍然是一个开放的巨大挑战。虽然政策学习方法为攻击此问题提供了强大的途径,但它们需要大量的每任务工程和算法调整。本文试图通过开发预先保证的灵巧操纵(PGDM)框架来逃避这些约束,从而在没有任何特定于任务的推理或超级参数调整的情况下会产生各种灵活的操纵行为。 PGD​​M的核心是一种众所周知的机器人构建体,即pre grasps(即用于对象相互作用的手工置序)。这种简单的原始性足以诱导有效的探索策略来获取复杂的灵巧操纵行为。为了详尽地验证这些主张,我们介绍了TCDM,这是根据多个对象和灵巧的操纵器定义的50个不同操纵任务的基准。 TCDM的任务是使用来自各种来源(动画师,人类行为等)的示例对象轨迹自动定义的,而无需任何执行任务工程和/或监督。我们的实验验证了PGDM的探索策略,该策略是由令人惊讶的简单成分(单个预抓姿势)引起的,与先前方法的性能相匹配,这些方法需要昂贵的每任意功能/奖励工程,专家监督和高参数调整。有关动画可视化,训练有素的策略和项目代码,请参阅:https://pregrasps.github.io/
translated by 谷歌翻译